GH action to generate report #199

Draft
wants to merge 10 commits into
base: main

asmacdo (Member) commented Sep 25, 2024

Fixes #177

Step 1: Create Skeleton

  • Authenticate with AWS
  • Connect to K8s cluster
  • Deploy our job-runner pod onto a Karpenter NodeClaim
  • Create a SPOT node as needed
  • Run dummy job
  • Delete Pod
  • Scale Down

I've verified that when a user-node is available (created by running a tiny JupyterHub), the job pod schedules on that node. I then shut down my JupyterHub, and all user-nodes scaled down. When I reran this job, Karpenter scaled up a new spot node; the pod was scheduled on it, ran successfully, and was deleted, and the node was cleaned up. Step 1 complete!
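For reference, a minimal sketch of what the job-runner pod spec from Step 1 could look like, built as a plain Python dict. The pod name, image, and command are hypothetical placeholders, and pinning the pod to spot capacity through the `karpenter.sh/capacity-type` node selector is an assumption, not necessarily what this PR does (the pod above also lands on an existing user-node when one is available).

```python
import json


def build_job_pod_manifest(name: str, image: str, command: list) -> dict:
    """Return a Pod manifest that asks for Karpenter-provisioned spot capacity."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Run once; do not restart the container after it exits.
            "restartPolicy": "Never",
            # Assumption: select spot capacity via Karpenter's well-known label.
            "nodeSelector": {"karpenter.sh/capacity-type": "spot"},
            "containers": [
                {"name": "runner", "image": image, "command": command}
            ],
        },
    }


# Hypothetical values for illustration only.
manifest = build_job_pod_manifest(
    name="disk-usage-report",
    image="busybox",
    command=["sh", "-c", "echo dummy job"],
)
print(json.dumps(manifest, indent=2))
```

The JSON output can be saved to a file and passed to `kubectl apply -f`, since kubectl accepts JSON manifests as well as YAML. A selector like this only schedules if some NodePool in the cluster allows spot capacity.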

Step 2: Generate Report

  • Connect Pod to EFS
  • List users
  • du each user
  • du shared
  • Collate data into report
  • Double-check that nodes come up and down successfully
  • Run the job several times in one day, then check the next day for an EFS usage spike (IIUC we should be fine because EFS is in Bursting mode)
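The "list users, du each user, du shared, collate" part of Step 2 could be sketched roughly as below. This assumes each user has one directory directly under a home root on the EFS mount; the function names and report layout are hypothetical. It sums apparent file sizes, which is close to `du --apparent-size` rather than plain `du`, and it counts hard-linked files once per link.

```python
import json
import os
from pathlib import Path
from typing import Optional


def dir_size_bytes(root: Path) -> int:
    """Sum apparent file sizes under root without following symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=False):
        for fname in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, fname)).st_size
            except OSError:
                # File vanished or is unreadable; skip it.
                continue
    return total


def build_report(home_root: Path, shared_dir: Optional[Path] = None) -> dict:
    """Collect per-user usage (one directory per user) plus optional shared usage."""
    users = {}
    for entry in sorted(os.scandir(home_root), key=lambda e: e.name):
        if entry.is_dir(follow_symlinks=False):
            users[entry.name] = dir_size_bytes(Path(entry.path))
    report = {"users": users, "total_user_bytes": sum(users.values())}
    if shared_dir is not None:
        report["shared_bytes"] = dir_size_bytes(shared_dir)
    return report


def report_to_json(report: dict) -> str:
    """Serialize the report with stable key order so diffs between runs stay small."""
    return json.dumps(report, indent=2, sort_keys=True)
```

Calling `report_to_json(build_report(Path(home_root), Path(shared_dir)))` with the real mount paths would give a file that is easy to commit and diff week to week. Walking every file reads a lot of metadata, so the EFS usage check in the last bullet still applies.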

Step 3: Push Report

  • Create private GitHub repository to store reports
  • Configure bot permission to push to repo
  • Push report to repo on completion

Questions to answer:

  • If a SPOT node is preempted, can we redeploy again later?

Comment on lines +4 to +6
pull_request:
  branches:
    - main
Member

Should we also run this on a weekly basis?

Member Author

It's on PR push just so I can test easily, but yes, 1/week sounds good to me. @kabilar Do you have a preference for what day/time?

kabilar (Member) commented Sep 26, 2024

Great, thank you. How about Mondays at 6am EST? We can then review the report on Monday mornings.
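One thing to watch when wiring this up: as far as I know, the GitHub Actions `schedule` trigger evaluates cron expressions in UTC, so "Mondays at 6am EST" has to be shifted. A small sketch of the conversion, assuming a fixed UTC-5 offset for EST (the helper name is hypothetical):

```python
def weekly_cron_utc(weekday: int, hour: int, utc_offset_hours: int, minute: int = 0) -> str:
    """Convert a weekly local time to a 5-field cron expression in UTC.

    weekday uses cron numbering: 0 = Sunday ... 6 = Saturday.
    utc_offset_hours is the local zone's offset from UTC (EST is -5).
    """
    utc_hour = hour - utc_offset_hours
    # Carry into the neighbouring day when the shift crosses midnight.
    day_shift, utc_hour = divmod(utc_hour, 24)
    utc_weekday = (weekday + day_shift) % 7
    return f"{minute} {utc_hour} * * {utc_weekday}"


# Mondays at 6am EST (UTC-5)
print(weekly_cron_utc(weekday=1, hour=6, utc_offset_hours=-5))  # 0 11 * * 1
```

If "EST" here really means Eastern local time, the job would start at 7am local during daylight saving time; passing `utc_offset_hours=-4` gives the EDT equivalent, `0 10 * * 1`.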

Development

Successfully merging this pull request may close these issues.

User data quota cron job
2 participants